Aim of the study: To establish and evaluate fingerprint diagnostic models based on cerebrospinal fluid protein profiles in glioma, using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) and bioinformatics analysis, in order to identify new tumor markers.
Material and methods: SELDI-TOF-MS was used to detect cerebrospinal fluid proteins bound to ProteinChip H4. Cerebrospinal fluid protein profiles were obtained and analyzed using an artificial neural network (ANN). Fingerprint diagnostic models of cerebrospinal fluid protein profiles were established for distinguishing glioma from non-brain-tumor controls and for distinguishing glioma from benign brain tumors. A support vector machine (SVM) algorithm was used to verify the established diagnostic models, and candidate tumor markers were screened.
Results: In the fingerprint diagnostic model of cerebrospinal fluid protein profiles for distinguishing glioma from non-brain-tumor controls, the sensitivity and specificity for glioma diagnosis were 100% and 91.7%, respectively, and seven candidate tumor markers were obtained. In the model for distinguishing glioma from benign brain tumors, the sensitivity and specificity for glioma diagnosis were 88.9% and 100%, respectively, and eight candidate tumor markers were identified.

Conclusions: The combination of SELDI-TOF-MS and bioinformatics tools is a very effective method for screening and identifying new markers of glioma. The established diagnostic models provide a new approach to the clinical diagnosis of glioma, especially qualitative diagnosis.

Key words: glioma, cerebrospinal fluid, SELDI-TOF-MS, artificial neural network, support vector machine, diagnostic model, tumor markers.
Introduction

Glioma is a space-occupying lesion that seriously endangers human health and has a high incidence among brain tumors. It is frequently malignant and is the most harmful type of brain tumor [1]. In clinical practice, it is sometimes difficult to distinguish glioma preoperatively from other brain tumors, even with modern imaging technologies. At present, no tumor marker of proven clinical value is available for the diagnosis of glioma. Therefore, seeking new markers of glioma and improving its clinical diagnosis have become a focus of brain tumor research.

Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) is one of the most effective proteomics platforms for detecting protein profiles [2]. Its basic principle is as follows: proteins in the sample are captured by a specific surface (probe) on a protein chip microarray; the bound proteins are then desorbed and ionized by a laser, and the ions are separated according to their mass-to-charge ratio (m/z), which is determined from their time of flight. Each sample yields one mass spectrum, in which individual proteins appear as peaks. These data are then collected and analyzed using appropriate software. In recent years, this method has become one of the main means of finding new tumor markers on proteomics platforms [3]. Analysis software based on the artificial neural network (ANN) technique has been successfully applied to analyzing and processing complex proteomics data [4].
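As a rough illustration of how a time-of-flight analyzer separates ions by m/z, the textbook relation for an ion of charge ze accelerated through a potential U and drifting a field-free distance L is zeU = mv²/2 with v = L/t, so m/z = 2eUt²/L². The sketch below assumes an idealized linear analyzer (ions start at rest, the acceleration region is ignored); the voltage and drift length are hypothetical values, not parameters of the PBS-II instrument, and the marker masses are treated as singly charged purely for illustration.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs (exact SI value)
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da, charge, voltage, length_m):
    """Flight time (s) of an ion starting at rest in an ideal linear TOF analyzer."""
    m = mass_da * DALTON
    q = charge * ELEMENTARY_CHARGE
    velocity = math.sqrt(2 * q * voltage / m)  # from q*U = m*v**2 / 2
    return length_m / velocity


def mz_from_time(t, voltage, length_m):
    """Invert the ideal relation m/z = 2*e*U*t**2 / L**2; result in Da per charge."""
    return 2 * ELEMENTARY_CHARGE * voltage * t ** 2 / (length_m ** 2 * DALTON)


# Hypothetical settings: 20 kV accelerating voltage, 0.8 m drift tube.
U, L = 20_000.0, 0.8
for mass in (6089.602, 7291.292, 16021.94):  # marker masses, assumed singly charged
    t = flight_time(mass, 1, U, L)
    print(f"{mass:>10.3f} Da -> {t * 1e6:6.2f} us -> m/z {mz_from_time(t, U, L):.3f}")
```

Because flight time grows with the square root of m/z, heavier ions arrive later, and an ion with twice the mass and twice the charge arrives at the same time as its singly charged counterpart.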

In our earlier studies, mass spectrometry and bioinformatics analysis were used to study serum samples from common tumors, including glioma, colorectal cancer, esophageal cancer, breast cancer and ovarian cancer, with some success. However, the particular location of glioma and the blood-brain barrier limit the diagnostic value of blood tests. Because cerebrospinal fluid is in direct contact with brain tissue, changes in protein profiles caused by a brain tumor can be reflected directly in the cerebrospinal fluid. In this study, the SELDI-TOF-MS platform was used to detect cerebrospinal fluid protein profiles in glioma, and ANN was used for bioinformatic analysis of the collected data. Fingerprint diagnostic models of cerebrospinal fluid protein profiles were established for distinguishing glioma from non-brain-tumor controls and for distinguishing glioma from benign brain tumors. The support vector machine (SVM) algorithm was used to evaluate the established diagnostic models, and candidate tumor markers were screened. The objective was to develop new methods for the clinical diagnosis of glioma.

Material and methods 

General data 

Cerebrospinal fluid samples from patients with glioma and benign brain tumors were collected between June 2009 and December 2011 in the Department of Neurosurgery, Second Affiliated Hospital of Zhejiang University (Hangzhou, China). All samples were collected by preoperative lumbar puncture (needle inserted between the 3rd and 4th lumbar vertebrae, between the 4th and 5th lumbar vertebrae, or between the 5th lumbar and 1st sacral vertebrae). All diagnoses were confirmed by postoperative histopathology. The cerebrospinal fluid was centrifuged at 5000 rpm for 5 min and stored at -20°C until use. The samples comprised 22 glioma cases (12 males and 10 females; grades 1 and 2, 15 cases; grades 3 and 4, 7 cases), aged 28 to 71 years (median age 47.3 years), and 25 benign brain tumor cases, including 14 cases of benign meningioma (8 males and 6 females, median age 53.1 years), 6 cases of cerebral schwannoma, 4 cases of cerebral aneurysm, and 1 case of cerebral cholesteatoma.

Twenty-eight non-brain-tumor patients (17 males and 11 females) with mild traumatic brain injury (according to Glasgow Coma Scale [GCS] criteria) were recruited from the Department of Neurosurgery, Shaoxing First People's Hospital (China). Their cerebrospinal fluid was collected by lumbar puncture, and samples with blood contamination were excluded. Their ages ranged from 19 to 70 years (median age 48.8 years).

Instruments and analysis software 

The PBS-II SELDI-TOF-MS platform and ProteinChip H4 were provided by Ciphergen Biosystems, Inc. (USA). An ANN implemented on the MATLAB platform was used for bioinformatic analysis of the collected data, and SVM was used for verification.

Operation of SELDI-TOF-MS 

The cerebrospinal fluid samples were thawed in an ice bath and centrifuged at 5000 rpm for 5 min. The protein concentration of each sample was measured using the Bio-Rad DC protein assay kit; concentrations ranged from 0.03305 to 6.85031 mg/ml. Depending on protein concentration, 60 µl (< 0.5 mg/ml), 40 µl (0.5 to 1.0 mg/ml), 35 µl (1.0 to 3.0 mg/ml) or 30 µl (> 3.0 mg/ml) of sample was added to each well of a 96-well plate. An equal volume of 0.5% CHAPS buffer (pH 7.4) was then added, followed by 20 µM HEPES buffer to bring the total volume to 160 µl.
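The loading scheme above is simple arithmetic: the sample volume is chosen by concentration band, CHAPS buffer matches it, and HEPES buffer makes up the rest of the 160 µl. A minimal sketch follows; how exact boundary values (0.5, 1.0 and 3.0 mg/ml) are assigned to a band is an assumption, since the protocol's ranges touch at those points, and the function name is hypothetical.

```python
TOTAL_VOLUME_UL = 160  # final volume per well (ul)


def loading_volumes(protein_mg_per_ml):
    """Return per-well volumes (ul) and the resulting protein load (ug) for one sample."""
    # Sample volume by concentration band; boundary assignment is an assumption.
    if protein_mg_per_ml < 0.5:
        sample = 60
    elif protein_mg_per_ml <= 1.0:
        sample = 40
    elif protein_mg_per_ml <= 3.0:
        sample = 35
    else:
        sample = 30
    chaps = sample                            # equal volume of 0.5% CHAPS buffer
    hepes = TOTAL_VOLUME_UL - sample - chaps  # HEPES buffer tops up to 160 ul
    protein_ug = sample * protein_mg_per_ml   # 1 ul x 1 mg/ml = 1 ug
    return {"sample_ul": sample, "chaps_ul": chaps, "hepes_ul": hepes,
            "protein_ug": round(protein_ug, 2)}


# Lowest and highest concentrations reported in this study.
for conc in (0.03305, 6.85031):
    print(conc, loading_volumes(conc))
```

The example shows that, even with banded volumes, the protein load per well still spans roughly two orders of magnitude (about 2 to 205 µg) across the reported concentration range.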

ProteinChip H4 was mounted in the Bioprocessor and pre-equilibrated 3 times with 100 µl of 20 µM HEPES buffer (pH 7.05), for 5 min each time. These steps were conducted on ice (4°C). The treated sample was added to each well of the Bioprocessor, followed by centrifugation (250 rpm, 4°C) for 1 h to remove unbound protein. The wells were then washed 3 times with 100 µl of 20 µM HEPES (5 min each), followed by 2 washes with 100 µl of deionized water (1 min each). The chip was removed and allowed to air-dry. Then 0.5 µl of CHCA (α-cyano-4-hydroxycinnamic acid) was added to each spot and allowed to air-dry; these operations were repeated 2 times. Finally, the samples were analyzed by SELDI-TOF-MS.

Data collection and processing 

ProteinChip Software 3.0 was used to collect and process data under the following conditions: laser intensity, 140; detector sensitivity, 9; optimal mass range, 2000 to 20 000 Da; collection positions, 20 to 80, with 5 collections at each position, for a total of 65 collections. The instrument was calibrated with a standard protein chip before data collection. Peaks between 2000 and 30 000 m/z were first filtered with a signal-to-noise ratio (S/N) > 5 and then filtered a second time with S/N > 2. Retained m/z peaks had to be present in more than 10% of samples, and the deviation of a peak's m/z value across samples had to be less than 0.3%. Noise was thus removed from the raw data. ANN and SVM on the MATLAB platform were used to build diagnostic models on the denoised training set to distinguish the different groups.
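One plausible reading of the two-stage filtering criteria is a two-pass clustering: strong peaks (S/N > 5) define clusters across samples within a 0.3% mass window, clusters present in more than 10% of samples are retained, and a second pass accepts weaker peaks (S/N > 2) that fall inside retained clusters. The sketch below is a hypothetical reimplementation under that assumption, not the algorithm of ProteinChip Software 3.0; the peak record format, function name and toy data are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical peak record: (m/z, intensity, signal-to-noise ratio).
Peak = Tuple[float, float, float]


def cluster_peaks(samples: Dict[str, List[Peak]], mz_range=(2000.0, 30000.0),
                  first_sn=5.0, second_sn=2.0, min_fraction=0.10, mass_window=0.003):
    """Two-pass peak clustering sketch; returns (cluster centres, feature table)."""
    lo, hi = mz_range
    clusters = []  # each cluster: {"mz": [m/z values], "ids": set of sample ids}

    # Pass 1: only strong peaks (S/N > first_sn) seed or join clusters.
    for sid, peaks in samples.items():
        for mz, _, sn in peaks:
            if not (lo <= mz <= hi and sn > first_sn):
                continue
            for c in clusters:
                centre = sum(c["mz"]) / len(c["mz"])
                if abs(mz - centre) <= mass_window * centre:
                    c["mz"].append(mz)
                    c["ids"].add(sid)
                    break
            else:
                clusters.append({"mz": [mz], "ids": {sid}})

    # Keep clusters found in more than min_fraction of all samples.
    n = len(samples)
    centres = sorted(sum(c["mz"]) / len(c["mz"])
                     for c in clusters if len(c["ids"]) > min_fraction * n)

    # Pass 2: take the most intense peak with S/N > second_sn inside each
    # retained cluster's mass window; 0.0 if the sample has no such peak.
    table = {}
    for sid, peaks in samples.items():
        row = []
        for centre in centres:
            hits = [inten for mz, inten, sn in peaks
                    if sn > second_sn and abs(mz - centre) <= mass_window * centre]
            row.append(max(hits, default=0.0))
        table[sid] = row
    return centres, table


demo = {  # toy spectra, not study data
    "glioma_1": [(6089.6, 12.0, 8.1), (7291.3, 30.0, 15.2), (2500.0, 3.0, 2.4)],
    "glioma_2": [(6090.9, 10.5, 6.3), (7290.1, 28.0, 12.0)],
    "control_1": [(6088.8, 4.0, 3.1), (7292.0, 9.0, 5.6), (35000.0, 50.0, 20.0)],
}
centres, table = cluster_peaks(demo)
print(centres, table)
```

In the toy data, the weak control peak near m/z 6089 (S/N 3.1) could not seed a cluster in the first pass but is recovered in the second, while the peak at m/z 35000 falls outside the mass range and is discarded.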

Bioinformatics analysis and grouping 

An artificial neural network based on the back-propagation (BP) algorithm was used for data analysis and construction of the diagnostic models. Glioma was compared with non-brain-tumor controls and with benign brain tumors. Two-thirds of the samples were selected as the training set to build each diagnostic model, and the remaining one-third served as the test set for blind testing. The initially screened m/z peaks were ranked in ascending order of P value and added as inputs in that order, starting from the smallest P value, to train the models. The model at which sensitivity and specificity no longer increased was taken as the final diagnostic model. In parallel, the SVM algorithm was applied to verify the established models using the screened candidate tumor markers.
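The split-and-select procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the paper does not say how the P values were computed, so they are taken as given; "no longer increased" is read here as "sensitivity plus specificity stops rising"; and `evaluate` is a hypothetical callback standing in for training and testing the ANN on a candidate peak set.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def split_two_thirds(ids: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly assign about 2/3 of samples to training and 1/3 to the blind test set."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * 2 / 3)
    return shuffled[:cut], shuffled[cut:]


def forward_select(p_values: Dict[float, float],
                   evaluate: Callable[[List[float]], Tuple[float, float]]) -> List[float]:
    """Add peaks in ascending P-value order until sensitivity + specificity stops rising."""
    ranked = sorted(p_values, key=p_values.get)
    chosen: List[float] = []
    best = -1.0
    for peak in ranked:
        sens, spec = evaluate(chosen + [peak])
        if sens + spec <= best:
            break  # no further gain: keep the previous set as the final model
        chosen.append(peak)
        best = sens + spec
    return chosen


# Toy P values and scores (hypothetical), indexed by the number of peaks used.
toy_p = {7291.292: 0.001, 6089.602: 0.004, 16021.94: 0.02, 2500.0: 0.3}
toy_scores = {1: (0.80, 0.83), 2: (0.90, 0.90), 3: (1.00, 0.92), 4: (1.00, 0.92)}
final = forward_select(toy_p, lambda feats: toy_scores[len(feats)])
print(final)
```

With 50 samples the split gives 33 training and 17 test cases, and with 47 samples it gives 31 and 16, matching the set sizes reported in the Results.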

Results 

Fingerprint diagnostic model of cerebrospinal 
protein profiles for distinguishing glioma from 
non-brain tumor 

To find potential markers for distinguishing glioma from non-brain-tumor controls, 22 protein profiles from glioma patients were compared with 28 protein profiles from non-brain-tumor controls. A total of 65 536 m/z peaks were collected, and 103 m/z peaks were selected by clustering and peak value analysis. Seven m/z peaks (6089.602, 7154.886, 6055.822, 7291.292, 16021.94, 18756.25 and 7960.945) were then selected using ANN. These markers formed the optimal set and were used as the input variables of the ANN and the final basis for classification (Figs. 1 and 2).

Thirty-three samples were used as the training set to build a diagnostic model with ANN, and the other 17 cases were used for the blind test. The overall classification accuracy was 100% (33/33) for the training set and 94.1% (16/17) for the test set. In the blind test, the sensitivity was 100% (5/5) and the specificity 91.7% (11/12); the positive predictive value was 83.3% (5/6) and the negative predictive value 100% (11/11). The seven markers were also used to verify this model with the SVM algorithm, which gave an accuracy, sensitivity, specificity, positive predictive value and negative predictive value of 94.1% (16/17), 85.7% (6/7), 100% (10/10), 100% (6/6) and 90.9% (10/11), respectively (Tables 1 and 2).
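All five figures of merit come from the same 2×2 table of true and false positives and negatives. A minimal sketch, treating glioma as the positive class and using the blind-test counts implied by the reported ANN sensitivity and specificity (5/5 and 11/12 for glioma versus non-brain tumor; 8/9 and 7/7 for glioma versus benign brain tumor); the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return each metric as a (numerator, denominator) pair, glioma = positive class."""
    return {
        "accuracy": (tp + tn, tp + fn + tn + fp),
        "sensitivity": (tp, tp + fn),               # gliomas correctly detected
        "specificity": (tn, tn + fp),               # controls correctly rejected
        "positive predictive value": (tp, tp + fp),  # true gliomas among positive calls
        "negative predictive value": (tn, tn + fn),  # true controls among negative calls
    }


models = {
    "glioma vs non-brain tumor (ANN)": dict(tp=5, fn=0, tn=11, fp=1),
    "glioma vs benign brain tumor (ANN)": dict(tp=8, fn=1, tn=7, fp=0),
}
for label, counts in models.items():
    print(label)
    for name, (num, den) in diagnostic_metrics(**counts).items():
        print(f"  {name}: {100 * num / den:.1f}% ({num}/{den})")
```

Keeping the counts as pairs rather than percentages makes it easy to check that each reported fraction has the right denominator: predictive values are divided by the number of positive or negative calls, not by the size of the true class.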
Fig. 1. Spectra and gel views of the marker at m/z 7291.29 (left, mass spectra; right, pseudo-gel view; upper three spectra, gliomas; lower three spectra, non-brain tumors)
Fig. 2. Distributions of glioma and non-brain tumor in ANN (predictive value > 0.5, glioma; predictive value < 0.5, non-brain tumor; only one non-brain-tumor case was misclassified as glioma)



Fingerprint diagnostic model of cerebrospinal 
protein profiles for distinguishing glioma from 
benign brain tumor 

Twenty-two protein profiles from glioma patients and 25 protein profiles from benign brain tumor patients were compared. Two hundred forty-four preliminarily screened m/z peaks were analyzed by ANN, and the 47 samples were randomly divided into a training set (31 cases) and a test set (16 cases). After automatic optimization, 8 m/z peaks (3449.645, 7300.375, 16010.23, 6380.50, 8675.707, 3408.88, 17670.94 and 20238.78) were finally selected as markers. The results are shown in Figures 3 and 4. The accuracy, sensitivity, specificity, positive predictive value and negative predictive value were 93.8% (15/16), 88.9% (8/9), 100% (7/7), 100% (8/8) and 87.5% (7/8), respectively. The support vector machine was used to verify these 8 markers, which gave an accuracy, sensitivity, specificity, positive predictive value and negative predictive value of 93.8% (15/16), 100% (7/7), 88.9% (8/9), 87.5% (7/8) and 100% (8/8), respectively (Tables 3 and 4).

Discussion 

Brain tumors are common in adolescents. According to a National Cancer Institute (NCI) survey, brain tumors ranked second among causes of tumor death in adolescents in the USA in 2000, with a cure rate of around 30% [6]. Glioma is the brain tumor with the highest incidence and the greatest danger. Improving preoperative diagnosis to achieve a better prognosis has become one of the focuses of current medical research. With the development of imaging technologies, early diagnosis of glioma has made great progress. However, there is still no effective means for the preoperative differential and qualitative diagnosis of glioma, mainly because of the lack of specific biological markers [7]. Tumorigenesis is a multi-gene, multi-step evolutionary process involving interactions between internal and external factors. Tumor markers are therefore likely to involve a variety of proteins, and a single protein marker cannot fully reflect tumor protein expression. Previous molecular biological technologies could not easily detect multiple proteins simultaneously.

In recent years, the rapid development of proteomics has provided a new technical platform for seeking tumor markers composed of multiple proteins. Since the appearance of SELDI-MS, serum tumor markers for prostate cancer, breast cancer, ovarian cancer, lung cancer, colorectal cancer and liver cancer have been found, with higher sensitivity and specificity than previous biological markers [8-13]. SELDI-MS is a protein chip technology platform developed by Ciphergen Biosystems Inc. (USA), based on the work of Hutchens and Yip [14]. This technology has brought a methodological revolution to the field of proteomics [15]. It requires only small sample amounts and offers simple operation, high sensitivity and high throughput, advantages that earlier technologies, including liquid chromatography/mass spectrometry (LC-MS), two-dimensional gel electrophoresis-mass spectrometry (2-DE-MS), enzyme-linked immunosorbent assay (ELISA) and fluorescent labeling, lack [16]. SELDI-MS can detect trace proteins at the fmol (10⁻¹⁵ mol) level and obtain hundreds of thousands of protein data points from one sample. It overcomes many difficulties of traditional two-dimensional gel electrophoresis, including the separation of membrane proteins and of strongly acidic and basic proteins, and the detection of low-molecular-weight and low-abundance proteins. As a rapid, reproducible, highly sensitive and easily adopted analytical and diagnostic tool, it provides an effective technology platform for screening and identifying tumor markers [17]. In addition, the method generates a huge amount of data, which traditional data processing methods often struggle to handle. Therefore, bioinformatics techniques are indispensable for data analysis and processing.

Table 1. Sensitivity, specificity and accuracy for distinguishing glioma from non-brain tumor (ANN)

Group              n    Classified as glioma, %   Classified as non-brain tumor, %
Glioma             5    100 (5/5)                 0
Non-brain tumor    12   8.3 (1/12)                91.7 (11/12)
Total              17
Sensitivity, 100%; specificity, 91.7%; accuracy, 94.1%

Table 2. Sensitivity, specificity and accuracy for distinguishing glioma from non-brain tumor (SVM)

Group              n    Classified as glioma, %   Classified as non-brain tumor, %
Glioma             7    85.7 (6/7)                14.3 (1/7)
Non-brain tumor    10   0                         100 (10/10)
Total              17
Sensitivity, 85.7%; specificity, 100%; accuracy, 94.1%

Fig. 4. Distributions of glioma and benign brain tumor in ANN (predictive value > 0.5, glioma; predictive value < 0.5, benign brain tumor; only one glioma case was misclassified as benign brain tumor)

Artificial neural network based on the BP algorithm is a rapidly developing interdisciplinary field combining neuroscience, computer science, information science and engineering. Its advantages include a distinctive mode of information storage, good fault tolerance, large-scale parallel processing, and strong capacities for self-organization, self-learning and self-adaptation; it has been used in signal processing, pattern recognition and prediction, and has broad application prospects [18]. The BP network, proposed by Rumelhart and McClelland in 1986, is a multilayer feedforward network trained with the back-propagation algorithm. An artificial neural network is a nonlinear dynamic system [19] whose basic unit is the neuron. Each neuron is connected to other neurons through weights, receives their outputs, and produces its own output by applying its transfer function and threshold. An artificial neural network is composed of many such simple neurons operating in parallel. In the BP algorithm, the network is first assigned initial weights. In the forward pass, information enters at the input layer, is processed by the hidden layer, and is transmitted to the output layer, whose neurons produce the result. If the desired output is not obtained, the algorithm switches to the backward pass, in which information flows in the reverse direction and the connection weights between layers are adjusted layer by layer. The forward and backward passes then alternate until the error between the actual and expected outputs falls to an acceptable level. Artificial neural networks are among the bioinformatics algorithms most widely developed and applied in recent years. The support vector machine, a newer classification technique proposed by Vapnik et al. in 1995 [20], is also a learning algorithm, but it is based on statistical learning theory and follows a principle different from that of ANN. It builds a learning machine on the principle of structural risk minimization. This technique avoids the large-sample requirements of other algorithms, is especially suitable for small samples, and can avoid the overfitting (over-learning) to which ANN is prone. Hence it has attracted increasing attention.
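The forward and backward passes of the BP algorithm described above can be sketched in a few lines of NumPy. This is a minimal one-hidden-layer network with sigmoid units and a squared-error loss; the layer size, learning rate, stopping tolerance and synthetic data are illustrative assumptions, not the configuration of the MATLAB model used in this study. The 0.5 output threshold mirrors the decision rule stated in the figure captions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_bp(X, y, hidden=5, lr=0.5, epochs=5000, tol=1e-3, seed=0):
    """One-hidden-layer feedforward network trained by back-propagation.

    X: (n_samples, n_peaks) inputs; y: (n_samples,) with 1 = glioma, 0 = control.
    Stops when the mean squared error falls below `tol` or after `epochs` passes.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))  # initial weights
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        # Forward pass: input layer -> hidden layer -> output layer.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        err = out - t
        if np.mean(err ** 2) < tol:
            break  # error has reached an acceptable level
        # Backward pass: propagate the error and adjust weights layer by layer.
        d_out = err * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * (h.T @ d_out) / len(X)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (X.T @ d_h) / len(X)
        b1 -= lr * d_h.mean(axis=0)
    return W1, b1, W2, b2


def predict(params, X):
    """Network output in (0, 1); values above 0.5 are read as glioma."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel()


# Synthetic, well-separated toy data with 7 features (not study data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.0, 0.3, (20, 7)), rng.normal(-1.0, 0.3, (20, 7))])
y = np.array([1] * 20 + [0] * 20)
params = train_bp(X, y)
labels = (predict(params, X) > 0.5).astype(int)
accuracy = (labels == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

A real application would standardize the peak intensities and judge the model on a held-out test set rather than on the training data, as the study did with its blind test.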

In this study, SVM was used to verify the ANN results. Although the principles of SVM and ANN are completely different, the results of the two methods were very similar, which supports the reliability of the ANN results to some extent. However, the ANN results were comparatively stable, with sensitivity and specificity above 85% in both models, so the ANN results were taken as the final results. During data collection and processing, noise in the raw data was removed by two rounds of filtering, and the instrument was calibrated with a standard protein chip before data collection to minimize deviation.
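As a minimal sketch of how an independently trained classifier can check a marker set, the code below trains a linear soft-margin SVM by stochastic subgradient descent (the Pegasos scheme) and reports sensitivity and specificity on a held-out set. The linear kernel, regularization strength, folding the bias into the weight vector, and the synthetic data are all assumptions; the study does not state which kernel or which MATLAB SVM implementation was used.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM via Pegasos-style stochastic subgradient steps.

    y must be in {-1, +1}. Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (Xa @ w))),
    where Xa is X with a constant column appended (a regularized bias term).
    """
    rng = np.random.default_rng(seed)
    Xa = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xa.shape[1])
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            step += 1
            eta = 1.0 / (lam * step)               # decreasing step size
            violates = y[i] * (Xa[i] @ w) < 1      # hinge loss active for this sample
            w *= 1 - eta * lam                     # gradient step on the L2 penalty
            if violates:
                w += eta * y[i] * Xa[i]            # subgradient step on the hinge loss
    return w


def predict_svm(w, X):
    Xa = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xa @ w >= 0, 1, -1)


# Synthetic toy data: +1 = glioma, -1 = control (not study data).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(1.0, 0.4, (30, 8)), rng.normal(-1.0, 0.4, (30, 8))])
y = np.array([1] * 30 + [-1] * 30)
order = rng.permutation(60)
train, test = order[:40], order[40:]
w = train_linear_svm(X[train], y[train])
pred = predict_svm(w, X[test])
sensitivity = np.mean(pred[y[test] == 1] == 1)
specificity = np.mean(pred[y[test] == -1] == -1)
print(f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```

Because the SVM objective penalizes the weight norm directly, it tends to be less prone to overfitting small training sets than an unregularized network, which is the property the text cites for using it as a cross-check.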

Because cerebrospinal fluid is in direct contact with brain tumors, brain tumor markers are most likely to be detected in cerebrospinal fluid. In addition, direct testing of cerebrospinal fluid avoids the possible influence of the blood-brain barrier. Because it is difficult to obtain cerebrospinal fluid from healthy individuals, patients with mild traumatic brain injury (according to GCS criteria), whose cerebrospinal fluid is normal and free of hemorrhage, were selected as near-normal controls (non-brain-tumor group). The benign brain tumors included meningioma, neurilemmoma, hemangioma and cholesteatoma. Pituitary adenoma was not included among the benign brain tumors, because it has a special secretory function and the secreted proteins might mask the detection of tumor markers.

There are many reports on using MS and bioinformatics analysis to find tumor markers of glioma. However, studies seeking brain tumor markers in cerebrospinal fluid with these methods are less common [21, 22]. In this study, fingerprint diagnostic models of cerebrospinal fluid protein profiles were established for distinguishing glioma from non-brain-tumor controls and for distinguishing glioma from benign brain tumors. The diagnostic models employed both cross-validation and a double-blind test, and their sensitivity and specificity exceeded 85%. They are clearly superior to previous single biomarkers and have great potential value in clinical practice. However, screening for potential tumor markers by SELDI-TOF-MS remains controversial because of the poor reproducibility of the results. A likely reason is that experimental procedures (sample processing, type of energy-absorbing molecule, and calibration with standard proteins) are not standardized. Established MS models are not transferable between research groups, and different analysis software produces different results. Strategies for overcoming these drawbacks include standardizing operating procedures, cross-checking results with different software, and repeatedly validating the established models. In addition, because cerebrospinal fluid samples are not readily available, comparisons between gliomas of different grades, and between glioma and other malignant tumors, could not be conducted; a larger sample size is required in subsequent research. Not all markers selected solely on the basis of P value are proteins with biological significance, so further separation and identification may be necessary. In the next study, one or several of the most valuable tumor markers will be selected by comparison with online protein databases for further separation and identification, which should clarify the biological functions of these markers in glioma. Related research is in progress. We will also investigate whether the protein profiles found in cerebrospinal fluid also exist in serum and, if so, whether they differ.

In conclusion, the combination of SELDI-TOF-MS and bioinformatics analysis is a very effective method for screening and identifying new markers of glioma. The established diagnostic models provide a new approach to the clinical diagnosis of glioma, especially qualitative diagnosis, but problems such as poor reproducibility need to be solved as soon as possible.